When a large amount of loosely related unlabeled data is available and labeled data is scarce, the paradigm of machine intelligence shifts from purely supervised learning to a more practical scenario. Most existing algorithms assume that the underlying task distribution is stationary. Here we consider a more realistic and challenging setting in which the task distribution evolves over time. We name this problem semi-supervised meta-learning with evolving task distributions, abbreviated as SETS. Two key challenges arise in this more realistic setting: (i) how to use unlabeled data in the presence of a large amount of unlabeled out-of-distribution (OOD) data; and (ii) how to prevent catastrophic forgetting of previously learned task distributions due to task distribution shift. We propose an OOD-robust and knowledge-preserved semi-supervised meta-learning approach (ORDER) to tackle these two major challenges. Specifically, ORDER introduces a novel mutual information regularization to robustify the model with unlabeled OOD data and adopts an optimal transport regularization to remember previously learned knowledge in feature space. In addition, we test our method on a very challenging dataset: SETS on large-scale non-stationary semi-supervised task distributions consisting of (at least) 72K tasks. With extensive experiments, we demonstrate that the proposed ORDER alleviates forgetting on evolving task distributions and is more robust to OOD data than related strong baselines.
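For illustration, here is a minimal sketch of how the two regularizers described above could be combined in a training objective; the two-output model interface, the particular mutual-information surrogate, the Sinkhorn approximation of optimal transport, and all weights are assumptions for exposition, not the authors' implementation.

    # Hedged sketch only: hypothetical model interface (logits, features), illustrative weights.
    import torch
    import torch.nn.functional as F

    def sinkhorn_distance(x, y, eps=0.05, n_iters=50):
        """Entropic-regularized OT cost between two feature batches (uniform weights)."""
        cost = torch.cdist(x, y, p=2) ** 2                     # pairwise squared distances
        k = torch.exp(-cost / eps)
        a = torch.full((x.size(0),), 1.0 / x.size(0), device=x.device)
        b = torch.full((y.size(0),), 1.0 / y.size(0), device=y.device)
        u = torch.ones_like(a)
        for _ in range(n_iters):                               # Sinkhorn iterations
            v = b / (k.t() @ u)
            u = a / (k @ v)
        transport = torch.diag(u) @ k @ torch.diag(v)
        return (transport * cost).sum()

    def order_style_loss(model, labeled, unlabeled, old_features, lam_mi=0.1, lam_ot=0.1):
        x, y = labeled
        logits, feats = model(x)                               # assumed: task head + feature extractor
        sup_loss = F.cross_entropy(logits, y)

        # A mutual-information-style regularizer on unlabeled (possibly OOD) data:
        # confident per-sample predictions, diverse marginal predictions.
        u_logits, _ = model(unlabeled)
        p = F.softmax(u_logits, dim=1)
        cond_ent = -(p * torch.log(p + 1e-8)).sum(1).mean()    # H(Y|X)
        marg = p.mean(0)
        marg_ent = -(marg * torch.log(marg + 1e-8)).sum()      # H(Y)
        mi_reg = cond_ent - marg_ent                           # minimizing this maximizes I(X;Y)

        # Optimal-transport regularizer: keep current features close (in OT sense)
        # to features remembered from previously learned task distributions.
        ot_reg = sinkhorn_distance(feats, old_features)

        return sup_loss + lam_mi * mi_reg + lam_ot * ot_reg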
Task-free continual learning (CL) aims to learn on non-stationary data streams without explicit task definitions and without forgetting previous knowledge. The widely adopted memory-replay approach may gradually become less effective on long data streams, as the model may memorize the stored examples and overfit to the memory buffer. Second, existing methods overlook the high uncertainty in the memory data distribution, since there is a big gap between the memory data distribution and the distribution of all previous data examples. To address these problems, we propose, for the first time, a principled memory evolution framework that dynamically evolves the memory data distribution by making the memory buffer gradually harder to be memorized, via distributionally robust optimization (DRO). We then derive a family of methods to evolve the memory buffer data in the space of continuous probability measures via Wasserstein gradient flow (WGF). The proposed DRO is w.r.t. the worst-case evolved memory data distribution, thus guaranteeing model performance and learning features that are more robust than those of existing memory-replay-based methods. Extensive experiments on existing benchmarks demonstrate the effectiveness of the proposed methods in alleviating forgetting. As a by-product of the proposed framework, our method is more robust to adversarial examples than existing task-free CL methods.
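As a rough illustration of the idea, the following sketch evolves stored examples by ascending the current model's loss with a small diffusion term, a crude discretization in the spirit of DRO over the buffer; the step size, noise scale, and number of steps are assumptions, not the paper's derivation.

    # Hedged sketch only: illustrative hyper-parameters, not the authors' implementation.
    import torch
    import torch.nn.functional as F

    def evolve_memory(model, mem_x, mem_y, n_steps=5, step_size=0.01, noise_std=0.001):
        """Gradient-ascent (Langevin-style) update of stored inputs so the buffer
        gradually becomes harder for the current model to memorize."""
        x = mem_x.clone().detach()
        for _ in range(n_steps):
            x.requires_grad_(True)
            loss = F.cross_entropy(model(x), mem_y)
            grad, = torch.autograd.grad(loss, x)
            with torch.no_grad():
                # Ascend the loss (toward a worst-case buffer distribution) plus a
                # small diffusion term, a simple step reminiscent of a Wasserstein
                # gradient flow discretization.
                x = x + step_size * grad + noise_std * torch.randn_like(x)
            x = x.detach()
        return x

    # Usage sketch: evolve the buffer before replaying, then train on current + evolved data.
    # mem_x = evolve_memory(model, mem_x, mem_y)
    # loss = F.cross_entropy(model(batch_x), batch_y) + F.cross_entropy(model(mem_x), mem_y)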
EEG decoding systems based on deep neural networks have been widely used in decision making for brain-computer interfaces (BCI). However, because of the significant variance and noise in EEG signals, their predictions can be unreliable. Previous work on EEG analysis has mainly focused on exploring noise patterns in the source signal, while the uncertainty arising during the decoding process remains largely unexplored. Automatically detecting and quantifying such decoding uncertainty is important for BCI applications such as motor imagery for robotic arm control. In this work, we propose an uncertainty estimation model (UE-EEG) to explore the uncertainty during the EEG decoding process, which considers both the uncertainty in the input signal and the uncertainty in the model. A model-oriented method is adopted for model uncertainty estimation, and a Bayesian neural network is adopted to model the uncertainty of the input data. The model can be integrated into currently widely used deep learning classifiers without changing their architecture. We conduct extensive experiments on uncertainty estimation for within-subject EEG decoding and cross-subject EEG decoding on two public motor imagery datasets, where the proposed model achieves significant improvements in the quality of the estimated uncertainty and demonstrates that UE-EEG is a useful tool for BCI applications.
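A minimal sketch of one common way to obtain such decoding uncertainty, via repeated stochastic forward passes with dropout kept active at test time, is given below; the sample count and the entropy summary are illustrative assumptions rather than the UE-EEG implementation.

    # Hedged sketch only: a generic Monte Carlo dropout style estimate, not UE-EEG itself.
    import torch

    @torch.no_grad()
    def predictive_uncertainty(model, eeg_batch, n_samples=30):
        """Keep dropout active at test time, sample several predictions, and
        summarize their mean and spread as an uncertainty score."""
        model.train()                      # keeps dropout layers stochastic
        probs = torch.stack([torch.softmax(model(eeg_batch), dim=1)
                             for _ in range(n_samples)])          # (S, B, C)
        mean_p = probs.mean(0)                                     # predictive mean
        entropy = -(mean_p * torch.log(mean_p + 1e-8)).sum(1)      # total uncertainty per trial
        return mean_p.argmax(1), entropy

    # Usage sketch: reject or flag trials whose entropy exceeds a validation-chosen
    # threshold before sending commands to, e.g., a robotic arm.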
Virtual reality (VR) is becoming ubiquitous with the rise of consumer displays and commercial VR platforms. Such displays require low latency and high-quality rendering of synthetic imagery with reduced compute overhead. Recent advances in neural rendering have shown that image-based representations of virtual or physical environments can unlock new possibilities in 3D computer graphics. Specifically, Neural Radiance Fields (NeRF) have demonstrated that photorealistic quality and continuous view changes of 3D scenes can be achieved without losing view-dependent effects. While NeRF can significantly benefit rendering for VR applications, it faces unique challenges posed by wide field of view, high resolution, and stereoscopic/egocentric viewing, typically causing low quality and high latency of the rendered images. In VR, this not only harms the interactive experience but may also cause sickness. To tackle these problems toward six-degrees-of-freedom, egocentric, and stereo viewing in VR, we present the first gaze-contingent 3D neural representation and view synthesis method. We incorporate the human psychophysics of visual and stereo acuity into an egocentric neural representation of 3D scenery. We then jointly optimize latency/performance and visual quality while mutually bridging human perception and neural scene synthesis, to achieve perceptually high-quality immersive interaction. We conducted both objective analysis and subjective studies to evaluate the effectiveness of our approach. We find that our method significantly reduces latency (99% time reduction compared with NeRF) without loss of high-fidelity rendering (perceptually identical to the full ground truth). The proposed method can serve as a first step toward future VR/AR systems that capture, teleport, and visualize remote environments in real time.
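To make the gaze-contingent idea concrete, the following sketch allocates a per-pixel rendering budget according to a simple acuity falloff around the tracked gaze point; the falloff constant, field of view, and budget range are illustrative assumptions, not the paper's calibrated psychophysical model.

    # Hedged sketch only: a generic foveated sample-allocation heuristic.
    import numpy as np

    def samples_per_pixel(h, w, gaze_xy, fov_deg=110.0, max_spp=64, min_spp=1):
        ys, xs = np.mgrid[0:h, 0:w]
        # Approximate eccentricity (in degrees) of each pixel from the gaze point.
        deg_per_px = fov_deg / w
        ecc = np.hypot(xs - gaze_xy[0], ys - gaze_xy[1]) * deg_per_px
        # Assume acuity falls off roughly inversely with eccentricity; 2.3 deg is
        # used here only as a plausible falloff constant.
        acuity = 1.0 / (1.0 + ecc / 2.3)
        spp = min_spp + (max_spp - min_spp) * acuity
        return np.round(spp).astype(int)

    # Usage sketch: feed the returned per-pixel budget to the renderer so foveal
    # regions get full-quality synthesis while the periphery is rendered coarsely.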
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT remains strongly robust even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
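A hedged sketch of the overall token-fusion idea follows: both modalities are tagged with positional encodings derived from 3D points and decoded by object queries into boxes. The module names, dimensions, query count, and box parameterization are assumptions for exposition, not the released CMT code.

    # Hedged sketch only: illustrative dimensions and heads.
    import torch
    import torch.nn as nn

    class CrossModalDetectorSketch(nn.Module):
        def __init__(self, d_model=256, n_queries=900, n_classes=10):
            super().__init__()
            self.img_pos = nn.Linear(3, d_model)     # 3D coords lifted to image-token encodings
            self.pts_pos = nn.Linear(3, d_model)     # 3D coords lifted to point-token encodings
            self.queries = nn.Embedding(n_queries, d_model)
            layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
            self.decoder = nn.TransformerDecoder(layer, num_layers=6)
            self.box_head = nn.Linear(d_model, 10)   # assumed box parameterization (center, size, yaw, velocity)
            self.cls_head = nn.Linear(d_model, n_classes)

        def forward(self, img_tokens, img_coords, pts_tokens, pts_coords):
            # Implicit alignment: both modalities share position encodings built from 3D points.
            memory = torch.cat([img_tokens + self.img_pos(img_coords),
                                pts_tokens + self.pts_pos(pts_coords)], dim=1)
            q = self.queries.weight.unsqueeze(0).expand(memory.size(0), -1, -1)
            h = self.decoder(q, memory)
            return self.box_head(h), self.cls_head(h)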
Knowledge graphs (KG) have served as the key component of various natural language processing applications. Commonsense knowledge graphs (CKG) are a special type of KG, where entities and relations are composed of free-form text. However, previous works on KG completion and CKG completion suffer from long-tail relations and newly added relations which do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to address the challenge of limited annotated data. In this paper, we comprehensively survey previous attempts on such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions of FKGC.
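As a small illustration of the metric-based flavor of FKGC, the sketch below builds a relation prototype from a handful of support triples and ranks candidate tails by similarity; the embedding table and scoring rule are assumptions, not any particular surveyed model.

    # Hedged sketch only: a generic translation-style few-shot scorer.
    import torch

    def score_candidates(ent_emb, support_pairs, query_head, candidate_tails):
        """support_pairs: list of (head, tail) entity ids for the few-shot relation."""
        # Represent the relation as the mean translation between support heads and tails.
        proto = torch.stack([ent_emb[t] - ent_emb[h] for h, t in support_pairs]).mean(0)
        # A candidate tail scores highly if head + prototype lands near it.
        pred = ent_emb[query_head] + proto
        return -torch.cdist(pred.unsqueeze(0), ent_emb[candidate_tails]).squeeze(0)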
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are two-fold: first, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., feature-level and instance-level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, the novel classes can be improved significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modifications. When benchmarking results on the COCO dataset for the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
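The first insight can be sketched as masked average pooling of support features into dynamic class centers that then re-weight query features; the shapes and the weighting rule below are illustrative assumptions, not the RefT code.

    # Hedged sketch only: generic mask-based dynamic re-weighting.
    import torch
    import torch.nn.functional as F

    def reweight_query_features(support_feats, support_masks, query_feats):
        """support_feats: (K, C, H, W), support_masks: (K, 1, H, W) in [0, 1],
        query_feats: (C, Hq, Wq)."""
        # Masked average pooling gives one dynamic center per support example.
        centers = (support_feats * support_masks).sum((2, 3)) / (
            support_masks.sum((2, 3)) + 1e-6)                    # (K, C)
        center = F.normalize(centers.mean(0), dim=0)             # class center, (C,)
        q = F.normalize(query_feats, dim=0)                      # channel-normalized query map
        sim = torch.einsum('c,chw->hw', center, q)               # cosine-similarity map
        # Emphasize query locations that look like the support class.
        return query_feats * (1.0 + sim.clamp(min=0.0)).unsqueeze(0)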
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs are with a large number of parameters, which makes these GNNs computationally expensive. Therefore, it is difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution to compress GNNs, where a lightweight model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias from the teacher GNN. To handle such a problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborates that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
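To illustrate what a fairness-aware distillation objective can look like, the sketch below combines a standard KD loss with a statistical-parity-style penalty over a sensitive attribute; the disparity measure, temperatures, and weights are assumptions, not the RELIANT objective itself.

    # Hedged sketch only: generic fairness-regularized distillation loss.
    import torch
    import torch.nn.functional as F

    def fair_kd_loss(student_logits, teacher_logits, labels, sensitive,
                     T=2.0, alpha=0.5, beta=1.0):
        ce = F.cross_entropy(student_logits, labels)
        kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                      F.softmax(teacher_logits / T, dim=1),
                      reduction='batchmean') * (T * T)
        # Statistical-parity-style penalty: mean predicted probabilities should not
        # differ between the two sensitive groups (assumes both groups are present).
        p = F.softmax(student_logits, dim=1)
        gap = (p[sensitive == 0].mean(0) - p[sensitive == 1].mean(0)).abs().sum()
        return alpha * ce + (1 - alpha) * kd + beta * gap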
This paper focuses on designing efficient models with low parameter counts and FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, the trade-off between model accuracy and constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of the Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even when the same framework is shared. Motivated by this phenomenon, we deduce a simple yet efficient modern Inverted Residual Mobile Block (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase Efficient MOdel (EMO) based only on a series of iRMBs for dense applications. Massive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, e.g., our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing SoTA CNN-/Transformer-based models while trading off model accuracy and efficiency well.
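A hedged sketch of an iRMB-style block follows, mixing a depthwise convolution for short-distance dependency with self-attention for long-distance interactions inside one expand-process-project unit; the expansion ratio, normalization, and layer order are assumptions, not the exact iRMB definition.

    # Hedged sketch only: illustrative block structure, not the released EMO code.
    import torch
    import torch.nn as nn

    class IRMBSketch(nn.Module):
        def __init__(self, dim, expand=4, heads=4):
            super().__init__()
            hidden = dim * expand
            self.norm = nn.LayerNorm(dim)
            self.expand = nn.Conv2d(dim, hidden, 1)
            self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
            self.dwconv = nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden)
            self.project = nn.Conv2d(hidden, dim, 1)
            self.act = nn.SiLU()

        def forward(self, x):                      # x: (B, C, H, W)
            b, c, h, w = x.shape
            y = self.norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
            y = self.act(self.expand(y))
            # Long-distance interactions via self-attention over flattened tokens.
            t = y.flatten(2).transpose(1, 2)       # (B, H*W, hidden)
            t, _ = self.attn(t, t, t)
            y = y + t.transpose(1, 2).reshape(b, -1, h, w)
            # Short-distance dependency via a cheap depthwise convolution.
            y = self.act(self.dwconv(y))
            return x + self.project(y)             # inverted residual: expand -> process -> project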
The development of social media user stance detection and bot detection methods relies heavily on large-scale and high-quality benchmarks. However, in addition to low annotation quality, existing benchmarks generally have incomplete user relationships, suppressing graph-based account detection research. To address these issues, we propose a Multi-Relational Graph-Based Twitter Account Detection Benchmark (MGTAB), the first standardized graph-based benchmark for account detection. To our knowledge, MGTAB was built on the largest original data in the field, with over 1.55 million users and 130 million tweets. MGTAB contains 10,199 expert-annotated users and 7 types of relationships, ensuring high-quality annotation and diversified relations. In MGTAB, we extracted the 20 user property features with the greatest information gain, together with user tweet features, as the user features. In addition, we performed a thorough evaluation of MGTAB and other public datasets. Our experiments found that graph-based approaches are generally more effective than feature-based approaches and perform better when multiple relations are introduced. By analyzing the experimental results, we identify effective approaches for account detection and provide potential future research directions in this field. Our benchmark and standardized evaluation procedures are freely available at: https://github.com/GraphDetec/MGTAB.
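As an example of the multi-relational graph-based approaches the benchmark targets, the sketch below wires the 7 relation types and the extracted user features into an R-GCN-style classifier using PyTorch Geometric; the data-loading interface and tensor names are assumed, not taken from the repository.

    # Hedged sketch only: a generic multi-relational GNN baseline.
    import torch
    import torch.nn.functional as F
    from torch_geometric.nn import RGCNConv

    class AccountDetectorSketch(torch.nn.Module):
        def __init__(self, in_dim, hidden=128, num_relations=7, num_classes=2):
            super().__init__()
            self.conv1 = RGCNConv(in_dim, hidden, num_relations)
            self.conv2 = RGCNConv(hidden, num_classes, num_relations)

        def forward(self, x, edge_index, edge_type):
            h = F.relu(self.conv1(x, edge_index, edge_type))
            return self.conv2(h, edge_index, edge_type)

    # Usage sketch (hypothetical tensors): x holds the 20 property features plus
    # tweet features per user; edge_type indexes which of the 7 relations each edge is.
    # logits = AccountDetectorSketch(x.size(1))(x, edge_index, edge_type)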